Goto

Collaborating Authors

 constant approximation



Constant Approximation for Individual Preference Stable Clustering

Neural Information Processing Systems

Individual preference (IP) stability, introduced by Ahmadi et al. (ICML 2022), is a natural clustering objective inspired by stability and fairness constraints. A clustering is $\alpha$-IP stable if the average distance of every data point to its own cluster is at most $\alpha$ times the average distance to any other cluster. Unfortunately, determining if a dataset admits a $1$-IP stable clustering is NP-Hard. Moreover, before this work, it was unknown if an $o(n)$-IP stable clustering always exists, as the prior state of the art only guaranteed an $O(n)$-IP stable clustering. We close this gap in understanding and show that an $O(1)$-IP stable clustering always exists for general metrics, and we give an efficient algorithm which outputs such a clustering. We also introduce generalizations of IP stability beyond average distance and give efficient near optimal algorithms in the cases where we consider the maximum and minimum distances within and between clusters.


Dynamic Fair Division with Partial Information

Neural Information Processing Systems

We consider the fundamental problem of fairly and efficiently allocating $T$ indivisible items among $n$ agents with additive preferences. The items become available over a sequence of rounds, and every item must be allocated immediately and irrevocably before the next one arrives. Previous work shows that when the agents' valuations for the items are drawn from known distributions, it is possible (under mild technical assumptions) to find allocations that are envy-free with high probability and Pareto efficient ex-post. We study a \emph{partial-information} setting, where it is possible to elicit ordinal but not cardinal information. When a new item arrives, the algorithm can query each agent for the relative rank of this item with respect to a subset of the past items. When values are drawn from i.i.d.\ distributions, we give an algorithm that is envy-free and $(1-\epsilon)$-welfare-maximizing with high probability. We provide similar guarantees (envy-freeness and a constant approximation to welfare with high probability) even with minimally expressive queries that ask for a comparison to a single previous item. For independent but non-identical agents, we obtain envy-freeness and a constant approximation to Pareto efficiency with high probability. We prove that all our results are asymptotically tight.



Constant Approximation for Individual Preference Stable Clustering

Neural Information Processing Systems

Individual preference (IP) stability, introduced by Ahmadi et al. (ICML 2022), is a natural clustering objective inspired by stability and fairness constraints. A clustering is \alpha -IP stable if the average distance of every data point to its own cluster is at most \alpha times the average distance to any other cluster. Unfortunately, determining if a dataset admits a 1 -IP stable clustering is NP-Hard. Moreover, before this work, it was unknown if an o(n) -IP stable clustering always exists, as the prior state of the art only guaranteed an O(n) -IP stable clustering. We close this gap in understanding and show that an O(1) -IP stable clustering always exists for general metrics, and we give an efficient algorithm which outputs such a clustering.


Dynamic Fair Division with Partial Information

Neural Information Processing Systems

We consider the fundamental problem of fairly and efficiently allocating T indivisible items among n agents with additive preferences. The items become available over a sequence of rounds, and every item must be allocated immediately and irrevocably before the next one arrives. Previous work shows that when the agents' valuations for the items are drawn from known distributions, it is possible (under mild technical assumptions) to find allocations that are envy-free with high probability and Pareto efficient ex-post. We study a \emph{partial-information} setting, where it is possible to elicit ordinal but not cardinal information. When a new item arrives, the algorithm can query each agent for the relative rank of this item with respect to a subset of the past items.


Reviews: Hierarchical Clustering Beyond the Worst-Case

Neural Information Processing Systems

This paper studies the problem of hierarchical clustering in a beyond-the-worst-case setting, where the data is a random graph generated by a Hierarchical Stochastic Block Model (HSBM). It proposes an SVD linkage algorithm and proves that in certain regimes it returns a solution that is a constant approximation to the cost function of hierarchical clustering proposed by Dasgupta. It also considers an algorithm that combines the SDP relaxation approach and recursive sparsest cut approach in previous works to get a constant approximation in larger regimes. Roughly speaking, the HSBM model considered assumes some underlying tree T* over k leaves (each leaf corresponding to a bottom cluster containing a certain number of vertices). Each node in the tree has some weight in (0,1) and a node's weight should not be larger than its descendant's.


An Objective for Hierarchical Clustering in Euclidean Space and its Connection to Bisecting K-means

arXiv.org Machine Learning

This paper explores hierarchical clustering in the case where pairs of points have dissimilarity scores (e.g. distances) as a part of the input. The recently introduced objective for points with dissimilarity scores results in every tree being a 1/2 approximation if the distances form a metric. This shows the objective does not make a significant distinction between a good and poor hierarchical clustering in metric spaces. Motivated by this, the paper develops a new global objective for hierarchical clustering in Euclidean space. The objective captures the criterion that has motivated the use of divisive clustering algorithms: that when a split happens, points in the same cluster should be more similar than points in different clusters. Moreover, this objective gives reasonable results on ground-truth inputs for hierarchical clustering. The paper builds a theoretical connection between this objective and the bisecting k-means algorithm. This paper proves that the optimal 2-means solution results in a constant approximation for the objective. This is the first paper to show the bisecting k-means algorithm optimizes a natural global objective over the entire tree.


Optimization from Structured Samples for Coverage Functions

arXiv.org Machine Learning

We revisit the optimization from samples (OPS) model, which studies the problem of optimizing objective functions directly from the sample data. Previous results showed that we cannot obtain a constant approximation ratio for the maximum coverage problem using polynomially many independent samples of the form $\{S_i, f(S_i)\}_{i=1}^t$ (Balkanski et al., 2017), even if coverage functions are $(1 - \epsilon)$-PMAC learnable using these samples (Badanidiyuru et al., 2012), which means most of the function values can be approximately learned very well with high probability. In this work, to circumvent the impossibility result of OPS, we propose a stronger model called optimization from structured samples (OPSS) for coverage functions, where the data samples encode the structural information of the functions. We show that under three general assumptions on the sample distributions, we can design efficient OPSS algorithms that achieve a constant approximation for the maximum coverage problem. We further prove a constant lower bound under these assumptions, which is tight when not considering computational efficiency. Moreover, we also show that if we remove any one of the three assumptions, OPSS for the maximum coverage problem has no constant approximation.


The Canonical Distortion Measure for Vector Quantization and Function Approximation

arXiv.org Machine Learning

To measure the quality of a set of vector quantization points a means of measuring the distance between a random point and its quantization is required. Common metrics such as the {\em Hamming} and {\em Euclidean} metrics, while mathematically simple, are inappropriate for comparing natural signals such as speech or images. In this paper it is shown how an {\em environment} of functions on an input space $X$ induces a {\em canonical distortion measure} (CDM) on X. The depiction 'canonical" is justified because it is shown that optimizing the reconstruction error of X with respect to the CDM gives rise to optimal piecewise constant approximations of the functions in the environment. The CDM is calculated in closed form for several different function classes. An algorithm for training neural networks to implement the CDM is presented along with some encouraging experimental results.